The main asteroid belt is crossed by a web of mean-motion and secular resonances, which occur when there is a commensurability between the fundamental frequencies of an asteroid and the planets. Traditionally, such objects were identified by visual inspection of the time evolution of their resonant argument, a combination of the orbital elements of the asteroid and of the perturbing planet. Since the populations of asteroids affected by these resonances are, in some cases, of the order of several thousand objects, this has become a taxing task for a human observer. Recent works used convolutional neural network (CNN) models to perform this task automatically. In this work, we compare the outcome of such models with those of some of the most advanced and publicly available CNN architectures, such as VGG, Inception, and ResNet. The performance of these models is first tested and optimized using a validation set and a series of regularization techniques, such as data augmentation, dropout, and batch normalization. The three best models are then used to predict the labels of a larger test database containing thousands of images. The VGG models, with and without regularization, proved to be the most efficient at predicting the labels of large datasets. Since the Vera C. Rubin observatory is likely to discover up to four million new asteroids in the next few years, the use of these models may become quite valuable for identifying populations of resonant minor bodies.
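The regularization techniques named above (batch normalization, dropout, data augmentation) can be illustrated in a minimal, generic form. This is a hedged sketch of the standard definitions, not the authors' code or architecture; all names and array sizes here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def batch_norm(x, eps=1e-5):
    # Normalize each feature over the batch to zero mean and unit variance.
    return (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)

def dropout(x, p=0.5, training=True):
    # Inverted dropout: zero units with probability p during training, and
    # rescale the survivors so the expected activation is unchanged.
    if not training:
        return x
    mask = rng.random(x.shape) >= p
    return x * mask / (1.0 - p)

def augment(img):
    # A minimal data augmentation: mirror the image along its second axis.
    return img[:, ::-1]

features = rng.normal(loc=3.0, scale=2.0, size=(32, 8))
normed = batch_norm(features)
```

In a CNN training loop, `augment` would be applied to the input images, `batch_norm` after convolutional layers, and `dropout` before the final classifier, with `training=False` at inference time.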
Cross-view geo-localization aims to estimate the location of a query ground image by matching it against a database of geo-tagged reference aerial images. It is an extremely challenging task: its difficulties are rooted in the drastic viewpoint change and the different capture times of the two views. Despite these difficulties, recent works have achieved outstanding progress on cross-view geo-localization benchmarks. However, existing methods still perform poorly on cross-area benchmarks, in which the training and testing data are captured in two different regions. We attribute this deficiency to the models' inability to extract the spatial configuration of visual feature layouts and to their overfitting on low-level details from the training set. In this paper, we propose GeoDTR, which explicitly disentangles geometric information from raw features and learns the spatial correlations among visual features from aerial and ground pairs with a novel geometric layout extractor module. This module generates a set of geometric layout descriptors that modulate the raw features and produce high-quality latent representations. In addition, we elaborate on two categories of data augmentation: (i) layout simulation, which varies the spatial configuration while keeping the low-level details intact, and (ii) semantic augmentation, which alters the low-level details and encourages the model to capture spatial configurations. These augmentations help to improve the performance of cross-view geo-localization models, especially on cross-area benchmarks. Moreover, we propose a counterfactual-based learning process to help the geometric layout extractor explore spatial information. Extensive experiments show that GeoDTR not only achieves state-of-the-art results but also significantly boosts performance on both same-area and cross-area benchmarks.
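The two augmentation categories can be sketched in minimal form. This is a hypothetical illustration of the idea as stated in the abstract, not GeoDTR's actual implementation: layout simulation here is a coupled aerial rotation and ground-panorama shift, and semantic augmentation is a simple brightness jitter.

```python
import numpy as np

rng = np.random.default_rng(0)

def layout_simulation(ground, aerial):
    # Layout simulation: change the spatial configuration while keeping
    # low-level details intact -- rotate the aerial view by k quarter turns
    # and circularly shift the ground panorama by the matching fraction of
    # its width, so the pair stays geometrically consistent.
    k = int(rng.integers(0, 4))
    aerial = np.rot90(aerial, k, axes=(0, 1))
    ground = np.roll(ground, shift=k * ground.shape[1] // 4, axis=1)
    return ground, aerial

def semantic_augmentation(img):
    # Semantic augmentation: alter low-level details (here, brightness)
    # so the model must rely on spatial layout rather than appearance.
    return np.clip(img * rng.uniform(0.7, 1.3), 0.0, 1.0)

ground = rng.random((64, 256, 3))   # ground panorama, H x W x C
aerial = rng.random((128, 128, 3))  # square aerial patch
g2, a2 = layout_simulation(ground, aerial)
```

Note that layout simulation preserves every pixel value (only their arrangement changes), while semantic augmentation does the opposite, which is exactly the complementary split the abstract describes.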
The concept of geo-localization refers to the process of determining the location of some "entity" on Earth, typically in terms of Global Positioning System (GPS) coordinates. The entity of interest may be an image, a sequence of images, a video, a satellite image, or even objects visible within an image. As large-scale datasets of GPS-tagged media have rapidly become available thanks to smartphones and the internet, and as deep learning has risen to enhance the performance of machine learning models, the fields of visual and object geo-localization have emerged due to their significant impact on a wide range of applications such as augmented reality, robotics, self-driving vehicles, road maintenance, and 3D reconstruction. This paper provides a comprehensive survey of geo-localization involving images, covering both determining the location from which an image was captured (image geo-localization) and geo-localizing objects within an image (object geo-localization). We provide an in-depth study, including a summary of popular algorithms, a description of the proposed datasets, and an analysis of performance results, to illustrate the current state of each field.
Cross-modal person re-identification (Re-ID) is critical for modern video surveillance systems. The key challenge is to learn a cross-modal representation aligned with the semantic information provided for a person while discarding background information. This work presents a novel convolutional neural network (CNN) based architecture designed to learn semantically aligned cross-modal visual and textual representations. The underlying building block, named AXM-block, is a unified multi-layer network that dynamically exploits multi-scale knowledge and re-calibrates each modality according to shared semantics. To complement the convolutional design, contextual attention is applied in the text branch to capture long-term dependencies. In addition, we propose a unique design to enhance visual part-based feature coherence and locality information. Our framework has the novel ability to implicitly learn aligned semantics between modalities during the feature-learning stage. The unified feature learning effectively utilizes textual data as a super-annotation signal for visual representation learning and automatically rejects irrelevant information. The entire AXM-Net is trained end-to-end on the CUHK-PEDES data. We report results on two tasks, person search and cross-modal Re-ID. AXM-Net outperforms the current state-of-the-art (SOTA) methods and achieves 64.44% Rank@1 on the CUHK-PEDES test set. It also outperforms its competitors by >10% on the CrossRe-ID and CUHK-SYSU datasets.
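The idea of re-calibrating each modality from shared semantics can be sketched in a minimal form. This is a hypothetical illustration of the gating concept described above, not the AXM-block itself: a per-channel sigmoid gate is computed from statistics pooled over both modalities and then applied to each modality's features.

```python
import numpy as np

def recalibrate(vis, txt):
    # Hypothetical sketch of semantics-driven re-calibration: summarize BOTH
    # modalities into one shared per-channel statistic, squash it into a
    # (0, 1) gate, and re-weight each modality's features with that gate.
    shared = vis.mean(axis=0) + txt.mean(axis=0)  # shared summary, shape (C,)
    gate = 1.0 / (1.0 + np.exp(-shared))          # sigmoid channel gate
    return vis * gate, txt * gate

rng = np.random.default_rng(1)
vis = rng.normal(size=(5, 16))   # 5 visual feature vectors, 16 channels
txt = rng.normal(size=(7, 16))   # 7 textual feature vectors, 16 channels
v2, t2 = recalibrate(vis, txt)
```

Because the gate is derived from both modalities jointly, channels that carry no shared signal are suppressed in both branches, which is the intuition behind rejecting irrelevant information.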
Large, labeled datasets have driven deep learning methods to achieve expert-level performance on a variety of medical imaging tasks. We present CheXpert, a large dataset that contains 224,316 chest radiographs of 65,240 patients. We design a labeler to automatically detect the presence of 14 observations in radiology reports, capturing uncertainties inherent in radiograph interpretation. We investigate different approaches to using the uncertainty labels for training convolutional neural networks that output the probability of these observations given the available frontal and lateral radiographs. On a validation set of 200 chest radiographic studies that were manually annotated by 3 board-certified radiologists, we find that different uncertainty approaches are useful for different pathologies. We then evaluate our best model on a test set composed of 500 chest radiographic studies annotated by a consensus of 5 board-certified radiologists, and compare the performance of our model to that of 3 additional radiologists in the detection of 5 selected pathologies. On Cardiomegaly, Edema, and Pleural Effusion, the model ROC and PR curves lie above all 3 radiologist operating points. We release the dataset to the public as a standard benchmark to evaluate the performance of chest radiograph interpretation models.
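The "different approaches to using the uncertainty labels" can be sketched for the simplest policies: mapping uncertain labels to negative, mapping them to positive, or excluding them from the loss. The `-1` encoding for "uncertain" below is a hypothetical convention for illustration, not necessarily the dataset's on-disk format.

```python
import numpy as np

# Hypothetical label encoding: 1 = positive, 0 = negative, -1 = uncertain.
labels = np.array([1, 0, -1, 1, -1, 0])

def u_zeros(y):
    # Treat uncertain mentions as negative.
    return np.where(y == -1, 0, y)

def u_ones(y):
    # Treat uncertain mentions as positive.
    return np.where(y == -1, 1, y)

def u_ignore_mask(y):
    # Mask uncertain examples out of the training loss entirely.
    return y != -1
```

During training, `u_zeros`/`u_ones` would be applied to the target vector per pathology, while `u_ignore_mask` would multiply the per-example loss so uncertain studies contribute no gradient for that observation.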